CorpusReader: designing and querying multi-layer corpora

Author

  • Sylvain Loiseau
Abstract

CorpusReader is a framework for creating and querying multi-layer corpora: corpora that contain several levels of analysis (morphology, syntax, semantics, etc.) and are intended for observing correlations between these levels. Building, representing and querying such corpora is complex. CorpusReader's distinguishing feature is that it merges the outputs of existing corpus analysis tools, which avoids the problem of integrating the tools themselves at the software level.

KEYWORDS: multi-annotated corpora, quantitative linguistics, corpus linguistics, XML, annotation graphs.

Similar resources

Formalising Multi-layer Corpora in OWL DL - Lexicon Modelling, Querying and Consistency Control

We present a general approach to formally modelling corpora with multi-layered annotation, thereby inducing a lexicon model in a typed logical representation language, OWL DL. This model can be interpreted as a graph structure that offers flexible querying functionality beyond current XML-based query languages and powerful methods for consistency control. We illustrate our approach by applying ...

Querying Annotated Speech Corpora

This paper is concerned with querying annotated speech corpora. A growing number of such corpora is currently being created worldwide; however, their usefulness for a wider research community is restricted by the lack of standard tools for creating, editing, annotating, storing and querying them. Two solutions for these problems are presented here: the XML-based data format TASX for corpus crea...

GenoQuery: a new querying module for functional annotation in a genomic warehouse

MOTIVATION: We have to cope with both a deluge of new genome sequences and a huge amount of data produced by high-throughput approaches used to exploit these genomic features. Crossing and comparing such heterogeneous and disparate data will help improve the functional annotation of genomes. This requires designing elaborate integration systems such as warehouses for storing and querying these dat...

Developing a BIM-based Spatial Ontology for Semantic Querying of 3D Property Information

With the growing dominance of complex and multi-level urban structures, current cadastral systems, which are often developed based on 2D representations, are not capable of providing unambiguous spatial information about urban properties. Therefore, the concept of 3D cadastre is proposed to support 3D digital representation of land and properties and facilitate the communication of legal owners...

Querying Multi-Layer Annotation and Alignment in Translation Corpora

When dealing with linguistically annotated and aligned corpora, current research concentrates mainly on the investigation of translation properties. However, annotated and aligned corpora can be useful for practical translation as well, since translators also work with parallel corpora. Translators typically use raw sentence-aligned corpora stored in translation memories. In this paper we will s...

Journal title:
  • TAL

Volume 49  Issue 

Pages  -

Publication date: 2008